Deterministic-Probabilistic Models For Partially Observable Reinforcement Learning Problems
نویسندگان
چکیده
In this paper we consider learning the environment model in reinforcement learning tasks where the environment cannot be fully observed. The most popular frameworks for environment modeling are POMDPs and PSRs but they are considered difficult to learn. We propose to bypass this hard problem by assuming that (a) the sufficient statistic of any history can be represented as one of finitely many states and (b) this state is given by a deterministic map from histories to the finite state space. This finite set of states can be interpreted as the state space of an MDP which can then be used to plan. Now the learning problem is to estimate this deterministic history-state map. One of the earliest approaches in this direction is McCallum’s USM algorithm. Our work can roughly be understood as extending this general idea by replacing prediction suffix trees, used in USM, with deterministic-probabilistic finite automata from learning theory. In this paper we describe our model, derive a pseudo-Bayesian inference criterion, and show its consistency. We also describe a heuristic algorithm that uses the criterion to learn the models, along with experiments showing its efficacy.
منابع مشابه
Learning Partially Observable Action Models
In this paper we present tractable algorithms for learning a logical model of actions’ effects and preconditions in deterministic partially observable domains. These algorithms update a representation of the set of possible action models after every observation and action execution. We show that when actions are known to have no conditional effects, then the set of possible action models can be...
متن کاملLearning Partially Observable Deterministic Action Models
We present the first tractable, exact solution for the problem of identifying actions’ effects in partially observable STRIPS domains. Our algorithms resemble Version Spaces and Logical Filtering, and they identify all the models that are consistent with observations. They apply in other deterministic domains (e.g., with conditional effects), but are inexact (may return false positives) or inef...
متن کاملProbabilistic and Decision-Theoretic User Modeling in the Context of Software Customization
Research in the field of user modeling has aimed to supersede the current “one-size-fits-all” trend in software development that forces users to change their behaviour according to the preprogrammed functions. This paper discusses aspects of user modeling and its relevance to software customization. In particular, we focus on user modeling techniques that utilize probabilistic and decision-theo...
متن کاملReinforcement Learning in Partially Observable Markov Decision Processes using Hybrid Probabilistic Logic Programs
We present a probabilistic logic programming framework to reinforcement learning, by integrating reinforcement learning, in POMDP environments, with normal hybrid probabilistic logic programs with probabilistic answer set semantics, that is capable of representing domain-specific knowledge. We formally prove the correctness of our approach. We show that the complexity of finding a policy for a ...
متن کاملReinforcement Learning Algorithm for Partially Observable Markov Decision Problems
Increasing attention has been paid to reinforcement learning algo rithms in recent years partly due to successes in the theoretical analysis of their behavior in Markov environments If the Markov assumption is removed however neither generally the algorithms nor the analyses continue to be usable We propose and analyze a new learning algorithm to solve a certain class of non Markov decision pro...
متن کامل